Clustering Algorithms for Huge Datasets: A Mathematical Approach



Related articles

Efficient algorithms for exact hierarchical clustering of huge datasets: Tackling the entire protein space

Motivation: UPGMA (average-linkage clustering) is probably the most popular algorithm for hierarchical data clustering, especially in computational biology. UPGMA, however, is a complete-linkage method, in the sense that all edges between data points are needed in memory. Due to this prohibitive memory requirement, UPGMA is not scalable for very large datasets. Results: We present novel memory-co...


Efficient algorithms for accurate hierarchical clustering of huge datasets: tackling the entire protein space

MOTIVATION: UPGMA (average linking) is probably the most popular algorithm for hierarchical data clustering, especially in computational biology. However, UPGMA requires the entire dissimilarity matrix in memory. Due to this prohibitive requirement, UPGMA is not scalable to very large datasets. APPLICATION: We present a novel class of memory-constrained UPGMA (MC-UPGMA) algorithms. Given any pr...


Local Cluster Analysis: A New Approach for Evaluating Different Document Clustering Algorithms by Huge Corpora

Evaluating different clustering algorithms is one of the main challenges in document clustering. For different purposes, different clusters may highlight valuable aspects of the texts. Furthermore, standard corpora created for evaluating clustering algorithms are small compared with standard IR corpora, because a huge amount of human judgment is needed. We introduce a simple a...


Patch Relational Neural Gas - Clustering of Huge Dissimilarity Datasets

Clustering constitutes a ubiquitous problem when dealing with huge data sets for data compression, visualization, or preprocessing. Prototype-based neural methods such as neural gas or the self-organizing map offer an intuitive and fast variant which represents data by means of typical representatives, thereby running in linear time. Recently, an extension of these methods towards relational c...


Clustering Algorithms Optimizer: A Framework for Large Datasets

Clustering algorithms are employed in many bioinformatics tasks, including categorization of protein sequences and analysis of gene-expression data. Although these algorithms are routinely applied, many of them suffer from the following limitations: (i) relying on the tuning of predetermined parameters, such as a priori knowledge regarding the number of clusters; (ii) involving nondeterministic proced...



Journal

Journal title: International Journal of Computer Applications

Year: 2019

ISSN: 0975-8887

DOI: 10.5120/ijca2019918724